feat(mem_cache): Hybrid Memory Pool system (Step 1: RecurrentStatePool) #1031
Closed
JamesBrianD wants to merge 5 commits into sgl-project:main
Migrated from epic/support_kimi_linear with DP support added.
Pure buffer pool for linear recurrent layers (KDA/Mamba/GDN).
Key changes vs epic:
- max_num_reqs → size (align with upstream sglang MambaPool)
- dp_size param with slot dim sharded on P("data", ...)
- total_slots = ceil_to(size+1, dp_size) for DP divisibility
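The slot-count rule in the last bullet can be illustrated with a small sketch. The helper names `ceil_to` and `total_slots` mirror the bullet, but their bodies here are assumptions rather than the PR's actual code, and the description does not state why one extra slot is added to `size`.

```python
def ceil_to(value: int, multiple: int) -> int:
    """Round `value` up to the nearest multiple of `multiple` (assumed semantics)."""
    return -(-value // multiple) * multiple


def total_slots(size: int, dp_size: int) -> int:
    # size + 1 slots, padded so the slot dimension divides evenly
    # across dp_size data-parallel ranks.
    return ceil_to(size + 1, dp_size)


# With 4 DP ranks, 8 requested slots become 12 (3 per rank).
slots = total_slots(8, 4)
per_rank = slots // 4
```

Padding up to a multiple of `dp_size` is what lets the first buffer dimension be split evenly when it is sharded along the data axis.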
Rodrian7 added a commit to Rodrian7/sglang-jax that referenced this pull request on May 7, 2026
The first cut paired the DP-sharded RecurrentStatePool from sgl-project#1031 with a single-list slot allocator copied from epic. With dp_size > 1 the buffer's first dim is sharded along the 'data' axis (each rank physically holds a distinct slot range), so a single global free list would hand out slots that cross DP rank boundaries; read/write at those slots would land in the wrong rank's local buffer view.

Switch the allocator to per-DP: maintain one free list per rank with LOCAL indices [1..slots_per_rank], and route alloc/free by req.dp_rank. Callers (prepare_for_extend / decode) iterate per-DP, so all reqs in a single alloc() call share the same dp_rank.

Tests updated: dp_size=1 cases unchanged in semantics but now index into recurrent_free_slots[0]. DP test class rewritten with four per-rank cases (init local indexing, alloc routing, capacity miss isolation, free routing).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
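The per-rank routing described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the commit's code: the class name `PerRankSlotAllocator` and its method signatures are hypothetical, and only the ideas of one free list per rank, local indices `[1..slots_per_rank]`, and routing by `dp_rank` come from the message.

```python
from typing import List, Optional


class PerRankSlotAllocator:
    """Hypothetical allocator: one free list of LOCAL slot indices per DP rank."""

    def __init__(self, slots_per_rank: int, dp_size: int) -> None:
        # Each rank gets its own list of local indices 1..slots_per_rank.
        self.free_slots: List[List[int]] = [
            list(range(1, slots_per_rank + 1)) for _ in range(dp_size)
        ]

    def alloc(self, need: int, dp_rank: int) -> Optional[List[int]]:
        free = self.free_slots[dp_rank]
        if need > len(free):
            # Capacity miss is isolated to this rank; other ranks are untouched.
            return None
        taken, self.free_slots[dp_rank] = free[:need], free[need:]
        return taken

    def free(self, slots: List[int], dp_rank: int) -> None:
        # Slots return to the free list of the rank they were taken from.
        self.free_slots[dp_rank].extend(slots)


allocator = PerRankSlotAllocator(slots_per_rank=2, dp_size=2)
first = allocator.alloc(2, dp_rank=0)   # exhausts rank 0
miss = allocator.alloc(1, dp_rank=0)    # rank 0 has nothing left
other = allocator.alloc(1, dp_rank=1)   # rank 1 is unaffected
allocator.free(first, dp_rank=0)        # rank 0 slots become available again
```

Because every index handed out is local to one rank's shard, a request can never receive a slot that lives in another rank's portion of the buffer.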
Summary
- RecurrentStatePool with DP sharding support, migrated from epic/support_kimi_linear
- HybridReqToTokenPool (upcoming)
- max_num_reqs → size (align upstream MambaPool), dp_size param, slot dim sharded on P("data", ...)

Test plan
🤖 Generated with Claude Code